
Hadoop on HPC: Integrating Hadoop and Pilot-based Dynamic Resource Management



Abstract

High-performance computing platforms such as supercomputers have traditionally been designed to meet the compute demands of scientific applications. Consequently, they have been architected as producers and not consumers of data. The Apache Hadoop ecosystem has evolved to meet the requirements of data processing applications and has addressed many of the limitations of HPC platforms. There exists a class of scientific applications, however, that needs the collective capabilities of traditional high-performance computing environments and the Apache Hadoop ecosystem. For example, the scientific domains of bio-molecular dynamics, genomics, and network science need to couple traditional computing with Hadoop/Spark-based analysis. We investigate the critical question of how to present the capabilities of both computing environments to such scientific applications. Whereas this question needs answers at multiple levels, we focus on the design of resource management middleware that might support the needs of both. We propose extensions to the Pilot-Abstraction to provide a unifying resource management layer. This is an important step that allows applications to integrate HPC stages (e.g. simulations) with data analytics. Many supercomputing centers have started to officially support Hadoop environments, either in a dedicated environment or in hybrid deployments using tools such as myHadoop. This typically involves many intrinsic, environment-specific details that need to be mastered, and these often swamp conceptual issues such as: How best to couple HPC and Hadoop application stages? How to explore runtime trade-offs (data locality vs. data movement)? This paper provides both conceptual understanding and practical solutions to the integrated use of HPC and Hadoop environments.
